Source Separation of Musical Instrument Sounds in Polyphonic Musical Audio Signal and Its Application
Authors
Abstract
A change in music appreciation style, from “listening to high-fidelity (Hi-Fi) sounds” to “listening to preferred sounds,” has emerged with the evolution of digital audio processing technology over the past years. Previously, many people enjoyed music passively: they bought CDs or phonograph records, or downloaded MP3 audio files, loaded the disks or files into a media player, and hit the play button. At present, only musical experts with signal processing expertise can enjoy active music appreciation. To allow music novices to enjoy active music appreciation as well, we developed a functional audio player named INTER (INstrumenT EqualizeR). This player enables users to change the volume of each musical instrument part in an audio mixture such as a commercial CD recording. INTER requires musical audio signals in which each musical instrument performs solo. Since such solo performances are not generally available, they must be separated from the audio mixture of the musical piece; in other words, sound source separation is mandatory.

In this thesis, we focus on sound source separation that extracts all musical instrument sounds from a polyphonic musical audio signal. Our goals are to design and implement a sound source separation method and to apply it to a functional audio player, which enables users to edit the audio signals of existing musical pieces according to their preferences, and to query-by-example music information retrieval. Musical audio signals are usually polyphonic, with 5 to 20 musical instruments, and consist of both harmonic and inharmonic instrument sounds. We therefore tackle three technical issues in sound source separation for monaural polyphonic musical audio signals: (i) spectral modeling of both harmonic and inharmonic instrument sounds, (ii) recognition of complex mixtures of musical instrument sounds, and (iii) constraining the spectral models to the properties of each instrument. To solve issue (i), we propose an integrated model that combines harmonic and inharmonic tone models. To solve issue (ii), we propose score-informed sound source separation. To solve issue (iii), we propose a parameter estimation method using prior distributions of the timbre parameters.

Chapter 3 presents a method for sound source separation based on maximum likelihood estimation for musical audio signals that include both harmonic and inharmonic instrument sounds, solving issue (i). We define an integrated weighted mixture model consisting of harmonic and inharmonic models to represent the spectrograms of various musical instrument sounds. To decompose the magnitude spectrogram of the input audio mixture, we introduce spectral distribution functions to formulate the sound source separation problem and derive the optimal distribution function. Experimental evaluation shows that separation performance improves by integrating the harmonic and inharmonic models.
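Although the thesis formulates this decomposition with its own tone models and distribution functions, the core idea of distributing each spectrogram bin between a harmonic and an inharmonic model can be illustrated compactly. Below is a minimal NumPy sketch assuming a single known fundamental frequency, a Gaussian-comb harmonic shape, and a spectral-smoothness constraint on the inharmonic part; all names and parameters are illustrative, not the thesis's actual model.

```python
import numpy as np

def harmonic_inharmonic_split(X, f0, sr, n_iter=30, n_partials=20):
    """Sketch: split a magnitude spectrogram X (n_freq x n_frames) into a
    harmonic part, confined to a comb of partials of f0, and a spectrally
    smooth inharmonic part. Each bin of X is distributed between the two
    models in proportion to their current power (Wiener-style masking)."""
    n_freq, n_frames = X.shape
    freqs = np.linspace(0.0, sr / 2.0, n_freq)
    # Comb-shaped harmonic weighting: Gaussian bumps at k * f0.
    comb = sum(np.exp(-0.5 * ((freqs - k * f0) / (0.03 * f0)) ** 2)
               for k in range(1, n_partials + 1))
    comb = comb[:, None]                        # (n_freq, 1)
    H = comb * np.ones((1, n_frames))           # harmonic model, initial guess
    I = np.full_like(X, X.mean() + 1e-9)        # flat inharmonic model
    for _ in range(n_iter):
        total = H + I + 1e-12
        Xh = X * H / total                      # share given to harmonic model
        Xi = X * I / total                      # share given to inharmonic model
        # Refit harmonic model: keep the comb shape, per-frame amplitude.
        amp = (Xh * comb).sum(axis=0) / (comb ** 2).sum()
        H = comb * amp[None, :]
        # Refit inharmonic model: smooth Xi along frequency (moving average).
        kernel = np.ones(9) / 9.0
        I = np.apply_along_axis(lambda c: np.convolve(c, kernel, "same"), 0, Xi)
    return Xh, Xi
```

The bin-wise reassignment followed by a constrained refit is the same alternating structure that maximum likelihood estimation with spectral distribution functions follows, only with far simpler model classes here.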
Chapter 4 presents methods for separating musical audio signals based on maximum a posteriori (MAP) estimation using the integrated harmonic and inharmonic models, solving issues (ii) and (iii). As prior information, we use the musical score corresponding to the audio: a score such as a standard MIDI file (SMF) initializes the model parameters corresponding to onset time, pitch, and duration. We introduce two approaches to instrument timbre modeling: template sounds and prior distributions of the model parameters. Template sounds are sound examples generated by playing back each musical note of the SMF on a MIDI sound module; we initialize the model parameters by adapting them to the template sounds and then separate the observed spectrogram as described in Chapter 3. The template sounds constrain the model parameters of each musical sound. Prior distributions of the model parameters, in contrast, are trained from a musical instrument sound database and constrain the model parameters of each musical instrument. Experimental results show that the quality of sounds separated using the prior distributions is better than that of sounds separated using the template sounds.
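The score-based initialization and the prior-based constraint can both be sketched briefly. The following assumes the pretty_midi library for SMF parsing and a scalar Gaussian prior on a timbre parameter; the conjugate posterior-mean formula stands in for the thesis's actual update rules, and all names are illustrative.

```python
import pretty_midi  # assumption: pretty_midi used here to read the SMF

def init_note_models(smf_path):
    """Sketch: one model per note, with onset time, pitch (as F0), and
    duration fixed from the standard MIDI file, as described above."""
    pm = pretty_midi.PrettyMIDI(smf_path)
    note_models = []
    for inst in pm.instruments:
        for note in inst.notes:
            note_models.append({
                "instrument": inst.program,
                "onset": note.start,
                "duration": note.end - note.start,
                "f0": pretty_midi.note_number_to_hz(note.pitch),
            })
    return note_models

def map_gaussian(theta_ml, n_obs, obs_var, prior_mean, prior_var):
    """Sketch of a MAP update for one scalar timbre parameter under a
    Gaussian prior: a precision-weighted blend of the maximum likelihood
    estimate and the prior mean trained from an instrument database."""
    w_data = n_obs / obs_var
    w_prior = 1.0 / prior_var
    return (w_data * theta_ml + w_prior * prior_mean) / (w_data + w_prior)
```

With few or noisy observations the MAP estimate stays near the instrument's prior mean, which is exactly how the prior distributions keep each note's model parameters plausible for that instrument.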
Chapter 5 presents two applications that use the sound source separation results. First, we describe INTER, which allows users to control the volume of each instrument part within existing audio recordings in real time; users can manipulate the volume balance of the instruments and remix existing musical pieces. Second, we describe a query-by-example (QBE) approach to music information retrieval that allows users to customize query examples by directly modifying the volume of different instrument parts. Our QBE system first separates all instrument parts from the audio signal of a piece with the help of its musical score, and then lets users remix these parts to change the acoustic features that represent the musical mood of the piece. Experimental results show that the resulting shift in musical mood was actually caused by the volume changes in the vocal, guitar, and drum parts.

Chapter 6 discusses the major contributions of this study to different research fields, particularly sound source separation and instrument sound representation. We also discuss issues that remain to be resolved and future directions we wish to research. Chapter 7 concludes the thesis.
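To make the two applications in Chapter 5 concrete, here is a minimal sketch assuming the separated parts are equal-length time-domain arrays and that query and database pieces are compared by plain Euclidean distance between feature vectors; the actual INTER interface and the QBE acoustic features are more elaborate, and all names here are illustrative.

```python
import numpy as np

def remix(parts, gains):
    """INTER-style remixing sketch: scale each separated instrument part
    by a user-chosen gain and sum back into one signal. `parts` maps a
    part name to a 1-D sample array; missing gains default to 1.0."""
    out = np.zeros_like(next(iter(parts.values())), dtype=float)
    for name, signal in parts.items():
        out += gains.get(name, 1.0) * signal
    return out

def rank_by_query(query_feat, db_feats):
    """QBE sketch: return database indices sorted by Euclidean distance
    between each piece's feature vector and the (remixed) query's."""
    dists = np.linalg.norm(np.asarray(db_feats) - query_feat, axis=1)
    return np.argsort(dists)

# Example: boost the vocal and attenuate the guitar before querying.
# query_signal = remix(parts, {"vocal": 2.0, "guitar": 0.5})
```

Changing a single gain moves the query's feature vector, and hence the ranking, which is the mechanism behind the mood shift measured in the experiments.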
Similar resources
Musical Instrument Detection: Detecting instrumentation in polyphonic musical signals on a frame-by-frame basis
Knowledge of the instrumentation of a musical signal at any given time could be useful for major audio signal processing problems such as sound source separation and automated music transcription. Knowing which instruments are playing is a first step toward more intelligently designed solutions to these very important and largely unsolved challenges. So, in this paper, we attempt the problem of...
On the Use of Zero-crossing Rate for an Application of Classification of Percussive Sounds
We address the issue of automatically extracting rhythm descriptors from audio signals, to be eventually used in content-based musical applications such as in the context of MPEG7. Our aim is to approach the comprehension of auditory scenes in raw polyphonic audio signals without preliminary source separation. As a first step towards the automatic extraction of rhythmic structures out of signal...
Bayesian analysis of polyphonic western tonal music.
This paper deals with the computational analysis of musical audio from recorded audio waveforms. This general problem includes, as subtasks, music transcription, extraction of musical pitch, dynamics, timbre, instrument identity, and source separation. Analysis of real musical signals is a highly ill-posed task which is made complicated by the presence of transient sounds, background interferen...
Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation
This paper proposes a novel approach to musical instrument recognition in polyphonic audio signals by using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases modeled as a product of excitations and filters. The excitations are restricted to harmonic spectra and their fundamental ...
Musical Genre Classification of Audio Data Using Source Separation Techniques
We propose a two-step, audio feature-based musical genre classification methodology. First, we identify and separate the various musical instrument sources in the audio signal, using the convolutive sparse coding algorithm. Next, we extract classification features from the separated signals that correspond to distinct musical instrument sources. The methodology is evaluated and its performance ...